Avx512f #912
merge from base
merge base
r? @Amanieu (rust_highfive has picked a reviewer for you, use r? to override)
crates/core_arch/src/x86/avx512f.rs
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=knot_mask16&expand=3233)
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(all(test, not(target_os = "macos")), assert_instr(not))] // generate normal not code instead of knotw
Why are you special-casing macos here?
On macOS, it does not generate any "not" or "xor" instructions; it generates "vmovaps". I tested this in CI and on macOS with clang.
Can you post the fully assembly output you are getting on macOS? I find it extremely strange that it is behaving differently from Linux.
Rust on macOS:
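use std::arch::x86_64::_mm512_knot;

#[target_feature(enable = "avx512f")]
unsafe fn avx512() {
    // NOT of an empty 16-bit mask; the result is unused, only the generated code matters
    let _c = _mm512_knot(0b00000000_00000000);
}

fn main() {
    // Only call the AVX-512 function on CPUs that actually support it
    if is_x86_feature_detected!("avx512f") {
        unsafe { avx512(); }
    }
}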
.loc 6 10538 0
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movw %di, -4(%rbp)
Ltmp21:
.loc 6 10539 15 prologue_end
movzwl %di, %edi
movl $65535, %esi
callq __ZN9core_arch9core_arch3x867avx512f11_mm512_kxor17hbdf54d0540836621E
movw %ax, -6(%rbp)
.loc 6 0 15 is_stmt 0
movw -6(%rbp), %ax
.loc 6 10539 5
movw %ax, -2(%rbp)
movw -2(%rbp), %ax
movw %ax, -8(%rbp)
.loc 6 0 5
movw -8(%rbp), %ax
.loc 6 10540 2 is_stmt 1
addq $16, %rsp
popq %rbp
retq
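In this unoptimized build `_mm512_knot` is not inlined: it loads the constant 65535 (0xFFFF) into `%esi` and calls `_mm512_kxor`, so NOT is computed as XOR with an all-ones mask. The minimal scalar sketch below models that identity, using `u16` as a stand-in for `__mmask16` and hypothetical helpers `knot`/`kxor` (not the crate's functions), and checks it against plain bitwise NOT for all 65,536 masks. Because the two forms are interchangeable, the backend is free to pick `not`, `xor`, or another equivalent sequence, which is one plausible reason the instruction observed by `assert_instr` differs between platforms.

```rust
/// Scalar stand-in for `__mmask16`: one bit per 32-bit lane of a 512-bit vector.
type Mask16 = u16;

/// Hypothetical scalar model of `_mm512_kxor`: bitwise XOR of two 16-bit masks.
fn kxor(a: Mask16, b: Mask16) -> Mask16 {
    a ^ b
}

/// Hypothetical scalar model of `_mm512_knot`, written the way the debug
/// assembly above lowers it: XOR with the all-ones mask 0xFFFF (65535).
fn knot(a: Mask16) -> Mask16 {
    kxor(a, 0xFFFF)
}

fn main() {
    // Exhaustively check every 16-bit mask: XOR with all-ones equals bitwise NOT.
    for a in 0..=u16::MAX {
        assert_eq!(knot(a), !a);
    }
    // The input from the macOS reproducer: NOT of an empty mask is all ones.
    println!("{:#06x}", knot(0b00000000_00000000)); // prints 0xffff
}
```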
Clang on macOS:
For the We could then remove the mostly useless
You can change it to
crates/core_arch/src/x86/avx512f.rs
@@ -10524,7 +10524,7 @@ pub unsafe fn _mm512_kxor(a: __mmask16, b: __mmask16) -> __mmask16 {
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=knot_mask16&expand=3233)
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(not))] // generate normal not code instead of knotw
//#[cfg_attr(test, assert_instr(xor))] // generate normal not code instead of knotw
Please don't leave commented-out code. Either keep the line with xor or delete it entirely.
On macOS, it uses xor; on Linux, it uses not. So should I remove the assert_instr test for _mm512_knot?
Yes, in that case just remove the
knot, kandn, kxornd, kmov
permute: epi32
extractf32x4_ps not mask and maskz
permute_f32x4
permute_f64x2
permute_i32x4
permute_i64x2
moveldup_ps
movehdup_ps
movedup_pd
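The commits above add further mask intrinsics (knot, kandn, kmov and, assuming "kxornd" refers to it, kxnor) and the lane-duplicating moves moveldup_ps, movehdup_ps and movedup_pd. As a rough reference for their semantics, here is a sketch of hypothetical scalar models: `u16` stands in for `__mmask16`, `[f32; 16]` for `__m512` and `[f64; 8]` for `__m512d`. The function names are illustrative, not the crate's API, and the models follow the Intel Intrinsics Guide descriptions rather than any particular instruction selection.

```rust
/// Scalar stand-in for `__mmask16`.
type Mask16 = u16;

// Hypothetical scalar models of the 16-bit mask intrinsics.
fn knot(a: Mask16) -> Mask16 { !a } // bitwise NOT
fn kandn(a: Mask16, b: Mask16) -> Mask16 { !a & b } // NOT a, then AND with b
fn kxnor(a: Mask16, b: Mask16) -> Mask16 { !(a ^ b) } // NOT of XOR
fn kmov(a: Mask16) -> Mask16 { a } // plain copy

/// Model of moveldup_ps on 16 f32 lanes: duplicate each even-indexed lane.
fn moveldup_ps(a: [f32; 16]) -> [f32; 16] {
    let mut r = [0.0f32; 16];
    for i in (0..16).step_by(2) {
        r[i] = a[i];
        r[i + 1] = a[i];
    }
    r
}

/// Model of movehdup_ps on 16 f32 lanes: duplicate each odd-indexed lane.
fn movehdup_ps(a: [f32; 16]) -> [f32; 16] {
    let mut r = [0.0f32; 16];
    for i in (0..16).step_by(2) {
        r[i] = a[i + 1];
        r[i + 1] = a[i + 1];
    }
    r
}

/// Model of movedup_pd on 8 f64 lanes: duplicate each even-indexed lane.
fn movedup_pd(a: [f64; 8]) -> [f64; 8] {
    let mut r = [0.0f64; 8];
    for i in (0..8).step_by(2) {
        r[i] = a[i];
        r[i + 1] = a[i];
    }
    r
}

fn main() {
    let (a, b): (Mask16, Mask16) = (0b1100_1100_1100_1100, 0b1010_1010_1010_1010);
    assert_eq!(kandn(a, b), 0b0010_0010_0010_0010);
    assert_eq!(kxnor(a, b), 0b1001_1001_1001_1001);
    assert_eq!(knot(kmov(a)), 0b0011_0011_0011_0011);

    // Lanes hold their own index, so duplication is easy to see.
    let v: [f32; 16] = std::array::from_fn(|i| i as f32);
    let even: [f32; 16] = std::array::from_fn(|i| (i - i % 2) as f32); // 0,0,2,2,...
    let odd: [f32; 16] = std::array::from_fn(|i| (i - i % 2 + 1) as f32); // 1,1,3,3,...
    assert_eq!(moveldup_ps(v), even);
    assert_eq!(movehdup_ps(v), odd);

    let d: [f64; 8] = std::array::from_fn(|i| i as f64);
    let d_even: [f64; 8] = std::array::from_fn(|i| (i - i % 2) as f64);
    assert_eq!(movedup_pd(d), d_even);
    println!("all scalar models agree");
}
```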